GRAPHICS AND IMAGE PROCESSING SYSTEM

The present invention relates to a method of and 
apparatus for graphics and image processing. The 
invention has particular, although not exclusive,
relevance to the image processing of a sequence of source
images to generate a sequence of target images. The
invention has applications in computer animation and in
moving pictures.

Realistic facial synthesis is a key area of research in 
computer graphics. The applications of facial animation 
include computer games, video conferencing and character 
animation for films and advertising. However, realistic 
facial animation is difficult to achieve because the
human face is an extremely complex geometric form. 

The paper entitled "Synthesising realistic facial
expressions from photographs" by Pighin et al published
in Computer Graphics Proceedings Annual Conference
Series, 1998, describes one technique which is being
investigated for generating synthetic characters. The
technique extracts parts of facial expressions from input
images and combines these with the original image to
generate different facial expressions. The system then
uses a morphing technique to animate a change between
different facial expressions. The generation of an
animated sequence therefore involves the steps of
identifying a required sequence of facial expressions
(synthetically generating any if necessary) and then
morphing between each expression to generate the animated
sequence. This technique is therefore relatively complex
and requires significant operator input to control the
synthetic generation of new facial expressions.

One embodiment of the present invention aims to provide
an alternative technique for generating an animated video
sequence. The technique can be used to generate
realistic facial animations or to generate simulations of
hand drawn facial animations.

According to one aspect, the present invention provides
an image processing apparatus comprising: means for
receiving a source sequence of frames showing a first
object; means for receiving a target image showing a
second object; means for comparing the first object with
the second object to generate a difference signal; and
means for modifying each frame of the sequence of frames
using said difference signal to generate a target
sequence of frames showing the second object.

This aspect of the invention can be used to generate 2D 
animations of objects. It may be used, for example, to 
animate a hand-drawn character using a video clip of, for 
example, a person acting out a scene. The technique can
also be used to generate animations of other objects, 
such as other parts of the body and animals. 

A second aspect of the present invention provides a 
graphics processing apparatus comprising: means for
receiving a source sequence of three-dimensional models 
of a first object; means for receiving a target model of 
a second object; means for comparing a model of the first 
object with the model of the second object to generate a 
difference signal; and means for modifying each model in
the sequence of models for the first object using said 
difference signal to generate a target sequence of models 
of the second object. 

According to this aspect, three-dimensional models of,
for example, a human head can be modelled and animated in
a similar manner to the way in which the two-dimensional
images were animated.

The present invention also provides methods corresponding 
to the apparatus described above.

Exemplary embodiments of the present invention will now 
be described with reference to the accompanying drawings 
in which: 

Figure 1 is a schematic block diagram illustrating a 
general arrangement of a computer system which can be 
programmed to implement the present invention; 

Figure 2a is a schematic illustration of a sequence of 
image frames which together form a source video sequence; 

Figure 2b is a schematic illustration of a target image 
frame which is to be used to modify the sequence of image 
frames shown in Figure 2a; 

Figure 3 is a block diagram of an appearance model 
generation unit which receives some of the image frames 
of the source video sequence illustrated in Figure 2a 
together with the target image frame illustrated in 
Figure 2b to generate an appearance model; 

Figure 4 is a flow chart illustrating the processing 
steps employed by the appearance model generation unit 
shown in Figure 3 to generate the appearance model; 

Figure 5 is a flow diagram illustrating the steps 
involved in generating a shape model for the training 
images;

Figure 6 shows a head having a number of landmark points
placed over it;

Figure 7 illustrates the processing steps involved in 
generating a grey level model from the training images;

Figure 8 is a flow chart illustrating the processing 
steps required to generate the appearance model using the 
shape and grey level models; 

Figure 9 shows the head shown in Figure 6 with a mesh of 
triangles placed over the head; 

Figure 10 is a plot showing a number of landmark points 
surrounding a point;

Figure 11 is a block diagram of a target video sequence 
generation unit which generates a target video sequence 
from a source video sequence using a set of stored 
difference parameters;

Figure 12 is a flow chart illustrating the processing 
steps involved in generating the difference parameters; 

Figure 13 is a flow diagram illustrating the processing
steps which the target video sequence generation unit
shown in Figure 11 performs to generate the target video
sequence;

Figure 14a shows three frames of an example source video
sequence which is applied to the target video sequence 
generation unit shown in Figure 11; 

Figure 14b shows an example target image used to generate 
a set of difference parameters used by the target video
sequence generation unit shown in Figure 11;

Figure 14c shows a corresponding three frames from a 
target video sequence generated by the target video 
sequence generation unit shown in Figure 11 from the
three frames of the source video sequence shown in Figure 
14a using the difference parameters generated using the 
target image shown in Figure 14b; 

Figure 14d shows a second example of a target image used
to generate a set of difference parameters for use by the 
target video sequence generation unit shown in Figure 11; 
and 

Figure 14e shows the corresponding three frames from the 
target video sequence generated by the target video 
sequence generation unit shown in Figure 11 when the 
three frames of the source video sequence shown in Figure 
14a are input to the target video sequence generation 
unit together with the difference parameters calculated 
using the target image shown in Figure 14d. 

Figure 1 is a block diagram showing the general 
arrangement of an image processing apparatus according to 
an embodiment of the present invention. The apparatus 
comprises a computer 1 having a central processing unit 
(CPU) 3 connected to a memory 5 which is operable to 
store a program defining the sequence of operations of 
the CPU 3 and to store object and image data used in 
calculation by the CPU 3. 

Coupled to an input port of the CPU 3 there is an input 
device 7, which in this embodiment comprises a keyboard 
and a computer mouse. Instead of, or in addition to, the
computer mouse, another position sensitive input device
(pointing device) such as a digitiser with associated
stylus may be used.

A frame buffer 9 is also provided and is coupled to the
CPU 3. It comprises a memory unit (not shown) arranged to
store image data relating to at least one image, for
example by providing one (or several) memory location(s)
per pixel of the image. The value stored in the frame
buffer for each pixel defines the colour or intensity of
that pixel in the image. In this embodiment, the images
are represented by 2-D arrays of pixels, and are
conveniently described in terms of Cartesian coordinates,
so that the position of a given pixel can be described by
a pair of x-y coordinates. This representation is
convenient since the image is displayed on a raster scan
display 11. Therefore, the x-coordinate maps to the
distance along the line of the display and the
y-coordinate maps to the number of the line. The frame
buffer 9 has sufficient memory capacity to store at least
one image. For example, for an image having a resolution
of 1000 x 1000 pixels, the frame buffer 9 includes $10^6$
pixel locations, each addressable directly or indirectly
in terms of pixel coordinates x,y.
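
By way of illustration only, the addressing of pixel
locations by x-y coordinates might be sketched as follows.
The row-major layout, the single location per pixel and
the names pixel_index and frame_buffer are assumptions
made for this sketch and are not taken from the
embodiment.

```python
def pixel_index(x, y, width):
    """Return the linear frame-buffer location of pixel (x, y).

    x is the distance along a display line and y is the line number.
    Row-major storage with one location per pixel is an assumption.
    """
    if not 0 <= x < width:
        raise ValueError("x lies outside the display line")
    return y * width + x


WIDTH = HEIGHT = 1000
frame_buffer = [0] * (WIDTH * HEIGHT)  # 10**6 pixel locations
frame_buffer[pixel_index(3, 2, WIDTH)] = 255  # set the pixel at x=3, y=2
```

A buffer storing several locations per pixel (for example
for colour) would simply scale the returned index by the
number of locations per pixel.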

In this embodiment, a video tape recorder (VTR) 13 is
also coupled to the frame buffer 9, for recording the
image or sequence of images displayed on the display 11.
A mass storage device 15, such as a hard disc drive,
having a high data storage capacity is also provided and
coupled to the memory 5. Also coupled to the memory 5 is
a floppy disc drive 17 which is operable to accept
removable data storage media, such as a floppy disc 19,
and to transfer data stored thereon to the memory 5. The
memory 5 is also coupled to a printer 21, so that
generated images can be output in paper form, and to an
image input device 23, such as a scanner or video camera,
and a modem 25, so that input images and output images
can be received from and transmitted to remote computer
terminals via a data network, such as the internet.

The CPU 3, memory 5, frame buffer 9, display unit 11 and
mass storage device 15 may be commercially available as
a complete system, for example as an IBM compatible
personal computer (PC) or a workstation such as the
SPARCstation available from Sun Microsystems.

A number of embodiments of the invention can be supplied 
commercially in the form of programs stored on a floppy 
disc 19 or other medium, or as signals transmitted over 
a data link, such as the internet, so that the receiving
hardware becomes reconfigured into an apparatus embodying 
the present invention. 

In this embodiment, the computer 1 is programmed to
receive a source video sequence input by the image input
device 23 and to generate a target video sequence from
the source video sequence using a target image. In this
embodiment, the source video sequence is a video clip of
an actor acting out a scene, the target image is an image
of a second actor and the resulting target video sequence
is a video sequence showing the second actor acting out
the scene. The way in which this is achieved in this
embodiment will now be described with reference to
Figures 2 to 11.

Figure 2a schematically illustrates the sequence of image
frames ($f^s$) making up the source video sequence. In this
embodiment, there are 180 source image frames $f^s_0$ to
$f^s_{179}$ making up the source video sequence. In this
embodiment, the frames are black and white images having
500 x 500 pixels, whose value indicates the luminance of
the image at that point. Figure 2b schematically
illustrates the target image $f^T$ which is used to modify
the source video sequence. In this embodiment, the target
image is also a black and white image having 500 x 500
pixels, describing the luminance over the image.

In this embodiment, an appearance model is generated for
modelling the variations in the shape and grey level
(luminance) appearance of the two actors' heads. In this
embodiment, the appearance of the head and shoulders of
the two actors is modelled. However, for simplicity, in
the remaining description reference will only be made to
the heads of the two actors. This appearance model is
then used to generate a set of difference parameters
which describe the main differences between the heads of
the two actors. These difference parameters are then
used to modify the source video sequence so that the
actor in the video sequence looks like the second actor.

The modelling technique employed in the present
embodiment is similar to the modelling technique
described in the paper "Active Shape Models - Their
Training and Application" by T.F. Cootes et al, Computer
Vision and Image Understanding, Vol. 61, No. 1, January
1995, pp 38-59, the contents of which are incorporated
herein by reference.

TRAINING 

In this embodiment, the appearance model is generated
from a set of training images comprising a selection of
frames from the source video sequence and the target
image frame. In order for the model to be able to
regenerate any head in the video sequence, the training
images must include those frames which have the greatest
variation in facial expression and 3D pose. In this
embodiment, seven frames ($f^s_3$, $f^s_{26}$, $f^s_{34}$,
$f^s_{4?}$, $f^s_{98}$, $f^s_{m}$ and $f^s_{162}$) are selected
from the source video sequence as being representative of
the various different facial expressions and poses of the
first actor's face in the video sequence. As shown in
Figure 3, these training images are input to an
appearance model generation unit 31 which processes the
training images in accordance with user input from the
user interface 33, to generate the appearance model 35.
In this embodiment, the user interface 33 comprises the
display 11 and the input device 7 shown in Figure 1. The
way in which the appearance model generation unit 31
generates the appearance model 35 will now be described
in more detail with reference to Figures 4 to 8.

Figure 4 is a flow diagram illustrating the general
processing steps performed by the appearance model
generation unit 31 to generate the appearance model 35.
As shown, there are three general steps S1, S3 and S5.
In step S1, a shape model is generated which models the
variability of the head shapes within the training
images. In step S3, a grey level model is generated
which models the variability of the grey level of the
heads in the training images. Finally, in step S5, the
shape model and the grey level model are used to generate
an appearance model which collectively models the way in
which both the shape and the grey level vary within the
heads in the training images.

Figure 5 is a flow diagram illustrating the steps
involved in generating the shape model in step S1 of
Figure 4. As shown, in step S11, landmark points are
placed on the heads in the training images (the selected
frames from the video sequence and the target image)
manually by the user via the user interface 33. In
particular, in step S11, each training image is displayed
in turn on the display 11 and the user places the
landmark points over the head. In this embodiment, 86
landmark points are placed over each head in order to
delineate the main features in the head, e.g. the
position of the hair line, neck, eyes, nose, ears and
mouth. In order to be able to compare training faces,
each landmark point is associated with the same point on
each face. For example, landmark point LP8 is associated
with the bottom of the nose and landmark point LP6 is
associated with the left-hand corner of the mouth.
Figure 6 shows an example of one of the training images
with the landmark points positioned over the head and the
table below identifies each landmark point with its
associated position on the head.



Landmark  Associated position               Landmark  Associated position
point                                       point

LP1       Left corner of left eye           LP44      Eye, bottom
LP2       Right corner of right eye         LP45      Eye, top
LP3       Chin, bottom                      LP46      Eye, bottom
LP4       Right corner of left eye          LP47      Eyebrow, lower
LP5       Left corner of right eye          LP48      Eyebrow, upper
LP6       Mouth, left                       LP49      Cheek, left
LP7       Mouth, right                      LP50      Cheek, right
LP8       Nose, bottom                      LP51      Eyebrow, lower
LP9       Nose, between eyes                LP52      Eyebrow, upper
LP10      Upper lip, top                    LP53      Eyebrow, lower
LP11      Lower lip, bottom                 LP54      Eyebrow, upper
LP12      Neck, left, top                   LP55      Eyebrow, lower
LP13      Neck, right, top                  LP56      Eyebrow, upper
LP14      Face edge left, level with nose   LP57      Eyebrow, lower
LP15      Face edge                         LP58      Eyebrow, upper
LP16      Face edge right, level with nose  LP59      Eyebrow, lower
LP17      Face edge                         LP60      Eyebrow, upper
LP18      Top of head                       LP61      Eyebrow, lower
LP19      Hair edge                         LP62      Lower lip, top
LP20      Hair edge                         LP63      Centre forehead
LP21      Hair edge                         LP64      Upper lip, top left
LP22      Hair edge                         LP65      Upper lip, top right
LP23      Hair edge                         LP66      Lower lip, bottom right
LP24      Hair edge                         LP67      Lower lip, bottom left
LP25      Hair edge                         LP68      Eye, top left
LP26      Hair edge                         LP69      Eye, top right
LP27      Hair edge                         LP70      Eye, bottom right
LP28      Hair edge                         LP71      Eye, bottom left
LP29      Bottom, far left                  LP72      Eye, top left
LP30      Bottom, far right                 LP73      Eye, top right
LP31      Shoulder                          LP74      Eye, bottom right
LP32      Shoulder                          LP75      Eye, bottom left
LP33      Bottom, left                      LP76      Lower lip, top left
LP34      Bottom, middle                    LP77      Lower lip, top right
LP35      Bottom, right                     LP78      Chin, left
LP36      Left forehead                     LP79      Chin, right
LP37      Right forehead                    LP80      Neck, left
LP38      Centre, between eyebrows          LP81      Neckline, left
LP39      Nose, left                        LP82      Neckline
LP40      Nose, right                       LP83      Neckline, right
LP41      Nose edge, left                   LP84      Neck, right
LP42      Nose edge, right                  LP85      Hair edge
LP43      Eye, top                          LP86      Hair edge

The result of this manual placement of the landmark
points is a table of landmark points for each training
image, which identifies the (x,y) coordinates of each
landmark point within the image. The modelling technique
used in this embodiment works by examining the statistics
of these coordinates over the training set. In order to
be able to compare equivalent points from different
images, the heads must be aligned with respect to a
common set of axes. This is achieved, in step S13, by
iteratively rotating, scaling and translating the set of
coordinates for each head so that they all approximately
fill the same reference frame. The resulting set of
coordinates for each head forms a shape vector (x) whose
elements correspond to the coordinates of the landmark
points within the reference frame. In other words, the
shape and pose of each training head is represented by a
vector (x) of the following form:

$$x^i = [x^i_0, y^i_0, x^i_1, y^i_1, x^i_2, y^i_2, \ldots, x^i_{85}, y^i_{85}]^T$$

This iterative alignment process is described in detail
in the above paper by Cootes et al and will not be
described in detail here. The shape model is then
generated in step S15 by performing a principal component
analysis (PCA) on the set of shape training vectors
generated in step S13. An overview of this principal
component analysis will now be given. (The reader is
directed to a book by W.J. Krzanowski entitled
"Principles of Multivariate Analysis - A User's
Perspective", 1998, (Oxford Statistical Science Series)
for a more detailed discussion of principal component
analysis.)
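
Purely as an illustrative sketch of the rotating, scaling
and translating performed in step S13, a single
least-squares alignment of one set of landmark
coordinates onto a reference set might look as follows.
This is not the iterative procedure of Cootes et al; it
is one pairwise step, written with complex arithmetic,
and the function name align_shape and the sample
coordinates are hypothetical.

```python
import cmath
import math


def align_shape(shape, reference):
    """Rotate, scale and translate `shape` onto `reference`.

    Both arguments are equally ordered lists of (x, y) landmarks.
    Each point is treated as a complex number so that a rotation
    and scaling is one complex multiplication (an assumption of
    this sketch, not something stated in the specification).
    """
    z = [complex(x, y) for x, y in reference]
    w = [complex(x, y) for x, y in shape]
    z_mean = sum(z) / len(z)
    w_mean = sum(w) / len(w)
    zc = [p - z_mean for p in z]  # centred reference
    wc = [q - w_mean for q in w]  # centred shape
    # Complex factor a minimising sum |zc - a * wc|^2.
    a = sum(q.conjugate() * p for p, q in zip(zc, wc))
    a /= sum(abs(q) ** 2 for q in wc)
    aligned = [a * q + z_mean for q in wc]
    return [(p.real, p.imag) for p in aligned]


# Made-up landmarks: a rectangle and a transformed copy of it.
reference = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 1.0)]
transform = cmath.rect(1.7, math.radians(30))  # scale 1.7, rotate 30 deg
moved = []
for x, y in reference:
    q = transform * complex(x, y) + complex(5.0, -3.0)
    moved.append((q.real, q.imag))
aligned = align_shape(moved, reference)  # recovers `reference`
```

In an iterative scheme, each head would be aligned in
this way to a current estimate of the mean shape, the
mean would be recomputed, and the process repeated until
the shapes stop changing appreciably.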

A principal component analysis of a set of training data
finds all possible modes of variation within the training
data. However, in this case, since the landmark points
on the training heads do not move about independently,
i.e. their positions are partially correlated, most of
the variation in the training faces can be explained by
just a few modes of variation. In this embodiment, the
main mode of variation between the training faces is
likely to be the difference between the shape of the
first actor's head and the shape of the second actor's
head. The other main modes of variation will describe
the changes in shape and pose of the first actor's head
within the selected source video frames. The principal
component analysis of the shape training vectors $x^i$
generates a shape model (matrix $P_s$) which relates each
shape vector to a corresponding vector of shape
parameters, by:

$$b_s^i = P_s (x^i - \bar{x}) \qquad (1)$$

where $x^i$ is a shape vector, $\bar{x}$ is the mean shape
vector from the shape training vectors and $b_s^i$ is a
vector of shape parameters for the shape vector $x^i$. The
matrix $P_s$ describes the main modes of variation of the
shape and pose within the training heads; and the vector
of shape parameters ($b_s^i$) for a given input head has a
parameter associated with each mode of variation whose
value relates the shape of the given input head to the
corresponding mode of variation. For example, if the
heads in the training images include thin heads, normal
width heads and broad heads, then one mode of variation
which will be described by the shape model ($P_s$) will
have an associated parameter within the vector of shape
parameters ($b_s$) which affects, amongst other things, the
width of an input head. In particular, this parameter
might vary from -1 to +1, with parameter values near -1
being associated with thin heads, with parameter values
around 0 being associated with normal width heads and
with parameter values near +1 being associated with broad
heads.

Therefore, the more modes of variation which are required
to explain the variation within the training data, the
more shape parameters are required within the shape
parameter vector $b_s^i$. In this embodiment, for the
particular training images used, 20 different modes of
variation of the shape and pose must be modelled in order
to explain 98% of the variation which is observed within
the training heads. Therefore, using the shape model
($P_s$), the shape and pose of each head within the
training images can be approximated by just 20 shape
parameters. As those skilled in the art will appreciate,
in other embodiments, more or fewer modes of variation
may be required to achieve the same model accuracy. For
example, if the first actor's head does not move or
change shape significantly during the video sequence,
then fewer modes of variation are likely to be required
for the same accuracy.
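
One common way of deciding how many modes to keep, given
here only as a hedged sketch, is to sort the variances
(eigenvalues) produced by the principal component
analysis and retain the smallest number of leading modes
whose variances sum to the required fraction of the
total. The function name modes_needed and the example
variances are invented for illustration; the
specification does not state how the figure of 98% was
evaluated.

```python
def modes_needed(eigenvalues, fraction=0.98):
    """Smallest number of leading modes explaining `fraction`
    of the total variance (the sum of all eigenvalues)."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running >= fraction * total:
            return count
    return len(ordered)


variances = [50.0, 30.0, 15.0, 4.0, 1.0]  # made-up eigenvalues
kept = modes_needed(variances)  # first four modes cover 99 of 100
```

A head that hardly moves would concentrate its variance
in fewer eigenvalues, so the same threshold would be
reached with fewer modes.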

In addition to being able to determine a set of shape
parameters $b_s^i$ for a given shape vector $x^i$, equation 1
can be solved with respect to $x^i$ to give:

$$x^i = \bar{x} + P_s^T b_s^i \qquad (2)$$

since $P_s P_s^T$ equals the identity matrix. Therefore, by
modifying the set of shape parameters ($b_s$), within
suitable limits, new head shapes can be generated which
will be similar to those in the training set.
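
The relationship between equations 1 and 2 can be
illustrated with a small sketch. It assumes that the
model matrix is stored as a list of orthonormal rows, one
per mode of variation; the names project and reconstruct
and the three-element example vectors are hypothetical
and far smaller than the 172-element shape vectors of the
embodiment.

```python
def project(P, x, mean):
    """Equation 1: b = P (x - mean), with one row of P per mode."""
    diff = [xi - mi for xi, mi in zip(x, mean)]
    return [sum(p * d for p, d in zip(row, diff)) for row in P]


def reconstruct(P, b, mean):
    """Equation 2: x = mean + P^T b."""
    return [
        m + sum(row[k] * bj for row, bj in zip(P, b))
        for k, m in enumerate(mean)
    ]


mean = [1.0, 2.0, 3.0]
P = [[1.0, 0.0, 0.0],  # first mode of variation
     [0.0, 1.0, 0.0]]  # second mode of variation
x = [2.0, 4.0, 3.0]

b = project(P, x, mean)            # parameters of the input vector
x_back = reconstruct(P, b, mean)   # regenerates the input vector
x_new = reconstruct(P, [0.5, 2.0], mean)  # a new, modified vector
```

Because only some of the modes are retained, the
reconstruction is exact only for vectors lying in the
space spanned by the retained modes; for other vectors it
is an approximation.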

Once the shape model has been generated, a similar model
is generated to model the grey level within the training
heads. Figure 7 illustrates the processing steps
involved in generating this grey level model. As shown,
in step S21, each training head is deformed to the mean
shape. This is achieved by warping each head until the
corresponding landmark points coincide with the mean
landmark points (obtained from $\bar{x}$) depicting the
shape and pose of the mean head. Various triangulation
techniques can be used to deform each training head to
the mean shape. The preferred way, however, is based on a
technique developed by Bookstein based on thin plate
splines, as described in "Principal Warps: Thin-Plate
Splines and the Decomposition of Deformations", IEEE
Transactions on Pattern Analysis and Machine
Intelligence, Vol. 11, No. 6, pp 567-585, 1989, the
contents of which are incorporated herein by reference.



In step S23, a grey level vector ($g^i$) is determined for
each shape-normalised training head, by sampling the grey
level value at 10,656 evenly distributed points over the
shape-normalised head. A principal component analysis of
these grey level vectors is then performed in step S25.
As with the principal component analysis of the shape
training vectors, the principal component analysis of the
grey level vectors generates a grey level model (matrix
$P_g$) which relates each grey level vector to a
corresponding vector of grey level parameters, by:

$$b_g^i = P_g (g^i - \bar{g}) \qquad (3)$$

where $g^i$ is a grey level vector, $\bar{g}$ is the mean grey
level vector from the grey level training vectors and
$b_g^i$ is a vector of grey level parameters for the grey
level vector $g^i$. The matrix $P_g$ describes the main
modes of variation of the grey level within the
shape-normalised training heads. In this embodiment, 30
different modes of variation of the grey level must be
modelled in order to explain 98% of the variation which
is observed within the shape-normalised training heads.
Therefore, using the grey level model ($P_g$), the grey
level of each shape-normalised training head can be
approximated by just 30 grey level parameters.

In the same way that equation 1 was solved with respect
to $x^i$, equation 3 can be solved with respect to $g^i$ to
give:

$$g^i = \bar{g} + P_g^T b_g^i \qquad (4)$$

since $P_g P_g^T$ equals the identity matrix. Therefore, by
modifying the set of grey level parameters ($b_g$), within
suitable limits, new shape-normalised grey level faces
can be generated which will be similar to those in the
training set.

As mentioned above, the shape model and the grey level
model are used to generate an appearance model which
collectively models the way in which both the shape and
the grey level vary within the heads of the training
images. A combined appearance model is generated because
there are correlations between the shape and grey level
variations, which can be used to reduce the number of
parameters required to describe the total variation
within the training faces by performing a further
principal component analysis on the shape and grey level
parameters. Figure 8 shows the processing steps involved
in generating the appearance model using the shape and
grey level models previously determined. As shown, in
step S31, shape parameters ($b_s^i$) and grey level
parameters ($b_g^i$) are determined for each training head
from equations 1 and 3 respectively. The resulting
parameters are concatenated and a principal component
analysis is performed on the concatenated vectors to
determine the appearance model (matrix $P_{sg}$) such that:

$$c^i = P_{sg} b_{sg}^i \qquad (5)$$

where $c^i$ is a vector of appearance parameters controlling
both the shape and grey levels and $b_{sg}^i$ are the
concatenated shape and grey level parameters. In this
embodiment, 40 different modes of variation and hence 40
appearance parameters are necessary to model 98% of the
variation found in the concatenated shape and grey level
parameters. As those skilled in the art will appreciate,
this represents a considerable compression over the 86
landmark points and the 10,656 grey level values
originally used to describe each head.

HEAD REGENERATION

In addition to being able to represent an input head by
the 40 appearance parameters (c), it is also possible to
use those appearance parameters to regenerate the input
head. In particular, by combining equation 5 with
equations 1 and 3 above, expressions for the shape vector
($x^i$) and for the grey level vector ($g^i$) can be
determined as follows:

$$x^i = \bar{x} + Q_s c^i \qquad (6)$$

$$g^i = \bar{g} + Q_g c^i \qquad (7)$$

where $Q_s$ is obtained from $P_{sg}$ and $P_s$, and $Q_g$ is
obtained from $P_{sg}$ and $P_g$ (and where $Q_s$ and $Q_g$ map
the value of c to changes in the shape and
shape-normalised grey level data). However, in order to
regenerate the head, the shape-free grey level image
generated from the vector $g^i$ must be warped to take into
account the shape of the head as described by the shape
vector $x^i$. The way in which this warping of the
shape-free grey level image is performed will now be
described.
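
One possible reading of how the matrices of equations 6
and 7 follow from the earlier equations is sketched
below. It rests on two assumptions which are not stated
in the specification: that the product of the appearance
model matrix with its transpose equals the identity
matrix, as is stated for the shape and grey level models,
and that the appearance model matrix can be split
column-wise into a block R_s acting on the shape
parameters and a block R_g acting on the grey level
parameters. The names R_s and R_g are introduced here
purely for illustration.

```latex
\begin{align*}
P_{sg} &= \begin{bmatrix} R_s & R_g \end{bmatrix}
  && \text{(assumed column split)} \\
b_{sg}^i &\approx P_{sg}^T c^i
  \;\Rightarrow\;
  b_s^i \approx R_s^T c^i, \quad b_g^i \approx R_g^T c^i \\
x^i &= \bar{x} + P_s^T b_s^i \approx \bar{x} + P_s^T R_s^T c^i
  \;\Rightarrow\; Q_s = P_s^T R_s^T \\
g^i &= \bar{g} + P_g^T b_g^i \approx \bar{g} + P_g^T R_g^T c^i
  \;\Rightarrow\; Q_g = P_g^T R_g^T
\end{align*}
```

With the dimensions of this embodiment (172 shape
coordinates, 20 shape parameters and 40 appearance
parameters), the shape matrix obtained in this way would
have 172 rows and 40 columns, which is consistent with
equation 6 mapping 40 appearance parameters to a shape
vector.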

When the shape-free grey level vector ($g^i$) was
determined in step S23, the grey level at 10,656 points
over the shape-free head was determined. Since each head
is deformed to the same mean shape, these 10,656 points
are extracted from the same position within each
shape-normalised training head. If the position of each
of these points is determined in terms of the positions
of three landmark points, then the corresponding position
of that point in a given face can be determined from the
position of the corresponding three landmark points in
the given face (which can be found from the generated
shape vector $x^i$). In this embodiment, a mesh of
triangles is defined which overlays the landmark points
such that each corner of each triangle corresponds to one
of the landmark points. Figure 9 shows the head shown in
Figure 6 with the mesh of triangles placed over the head
in accordance with the positions of the landmark points.

Figure 10 shows a single point p located within the
triangle formed by landmark points LPi, LPj and LPk. The
position of point p relative to the origin (O) of the
reference frame can be expressed in terms of the position
of the landmark points LPi, LPj and LPk. In particular,
the vector between the origin and the point p can be
expressed by the following:

$$V_p = a P_i + b P_j + c P_k \qquad (8)$$

where a, b and c are scalar values and $P_i$, $P_j$ and $P_k$
are the vectors describing the positions of the landmark
points LPi, LPj and LPk. In the shape-normalised heads,
the positions of the 10,656 points and the position of
the landmark points LP are known, and therefore, the
values of a, b and c for each of the 10,656 points can be
determined. These values are stored and then used
together with the positions of the corresponding landmark
points in the given face (determined from the generated
shape vector $x^i$) to warp the shape-normalised grey level
head, thereby regenerating the head from the appearance
parameters (c).
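
A minimal sketch of this calculation for a single point
is given below. It assumes that the three scalar values
are constrained to sum to one (barycentric coordinates),
which makes them unique for a non-degenerate triangle;
equation 8 itself does not state this constraint. The
function names barycentric and warp_point and the
triangle coordinates are hypothetical.

```python
def barycentric(p, pi, pj, pk):
    """Solve p = a*pi + b*pj + c*pk with a + b + c = 1."""
    (x, y), (xi, yi), (xj, yj), (xk, yk) = p, pi, pj, pk
    det = (yj - yk) * (xi - xk) + (xk - xj) * (yi - yk)
    if det == 0:
        raise ValueError("degenerate triangle")
    a = ((yj - yk) * (x - xk) + (xk - xj) * (y - yk)) / det
    b = ((yk - yi) * (x - xk) + (xi - xk) * (y - yk)) / det
    return a, b, 1.0 - a - b


def warp_point(weights, qi, qj, qk):
    """Apply stored weights to the same triangle in another face."""
    a, b, c = weights
    return (a * qi[0] + b * qj[0] + c * qk[0],
            a * qi[1] + b * qj[1] + c * qk[1])


# Triangle in the shape-normalised head and a sample point inside it.
weights = barycentric((1.0, 1.0), (0.0, 0.0), (4.0, 0.0), (0.0, 4.0))
# The same triangle as positioned by a generated shape vector.
target = warp_point(weights, (10.0, 10.0), (14.0, 10.0), (10.0, 18.0))
```

In use, the weights would be computed once for each of
the sample points in the shape-normalised head and then
reused for every regenerated head, so that only the
landmark positions change from frame to frame.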

TARGET VIDEO SEQUENCE GENERATION

A description will now be given of the way in which the 
target video sequence is generated from the source video 
sequence. As shown in Figure 11, the source video 
sequence is input to a target video sequence generation 
unit 51 which processes the source video sequence using 
a set of difference parameters 53 to generate and to 
output the target video sequence. 

Figure 12 is a flow diagram illustrating the processing
steps involved in generating these difference parameters.
As shown, in step S41, the appearance parameters ($c_S$) for
an example of the first actor's head (from one of the
training images) and the appearance parameters ($c_T$) for
the second actor's head (from the target image) are
determined. This is achieved by determining the shape
parameter vector ($b_s$) and the grey level parameter vector
($b_g$) for each of the two images and then calculating the
corresponding appearance parameters by inserting these
shape and grey level parameters into equation 5. In step
S43, a set of difference parameters is then generated by
subtracting the appearance parameters ($c_S$) for the first
actor's head from the appearance parameters ($c_T$) for the
second actor's head, i.e. from:

$$c_{dif} = c_T - c_S \qquad (9)$$

In order that these difference parameters only represent
differences in the general shape and grey level of the
two actors' heads, the pose and expression of the first
actor's head in the training image used in step S41
should match, as closely as possible, the pose and
expression of the second actor's head in the target
image. Therefore, care has to be taken in selecting the
source video frame used to calculate the appearance
parameters in step S41.

The processing steps required to generate the target
video sequence from the source video sequence will now be
described in more detail with reference to Figure 13. As
shown, in step S51, the appearance parameters ($c_s^i$) for
the first actor's head in the current video frame are
automatically calculated. The way that this is achieved
in this embodiment will be described later. In step S53,
the difference parameters ($c_{dif}$) are added to the
appearance parameters for the current source head to
generate:

$$c_{mod}^i = c_s^i + c_{dif} \qquad (10)$$

The resulting appearance parameters ($c_{mod}^i$) are then used
in step S55 to regenerate the head for the current video
frame. In particular, the shape vector ($x^i$) and the
shape-normalised grey level vector ($g^i$) are generated
from equations 6 and 7 using the modified appearance
parameters ($c_{mod}^i$), and the shape-normalised grey
level image generated by the grey level vector ($g^i$) is
then warped using the 10,656 stored sets of scalar values
a, b and c and the shape vector ($x^i$), in the manner
described above, to regenerate the head. In this
embodiment, since the resolution of the video frame is
500 x 500 pixels, interpolation is used to determine the
grey level values for pixels located between the 10,656
points. The regenerated head is then composited, in step
S57, into the source video frame to generate a
corresponding target video frame. A check is then made,
in step S59, to determine whether or not there are any
more source video frames. If there are, then the
processing returns to step S51 where the procedure
described above is repeated for the next source video
frame. If there are no more source video frames, then
the processing ends.
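
The flow of steps S43 and S51 to S59 can be summarised in a minimal Python sketch. It is an illustration only: it assumes NumPy, and the model-specific stages (fitting, regeneration and compositing) are passed in as hypothetical callables named `fit`, `regenerate` and `composite`, since their implementation depends on the appearance model described elsewhere.

```python
import numpy as np

def difference_parameters(c_target, c_source_example):
    """Equation 9: c_dif = c_T - c_s."""
    return np.asarray(c_target, float) - np.asarray(c_source_example, float)

def generate_target_sequence(source_frames, c_dif, fit, regenerate, composite):
    """Steps S51 to S59, with the model stages supplied by the caller.

    fit(frame, c_prev)     -> appearance parameters of the source head (S51)
    regenerate(c_mod)      -> head regenerated from modified parameters (S55)
    composite(frame, head) -> target frame with the head composited in (S57)
    """
    target_frames = []
    c_prev = None  # no previous estimate exists for the first frame
    for frame in source_frames:              # S59: repeat while frames remain
        c_src = fit(frame, c_prev)           # S51
        c_mod = c_src + c_dif                # S53, equation 10
        head = regenerate(c_mod)             # S55
        target_frames.append(composite(frame, head))  # S57
        c_prev = c_src  # unmodified parameters seed the next frame's search
    return target_frames
```

Passing the unmodified parameters forward mirrors the text's use of the preceding frame's parameters, before modification in step S53, as the initial estimate for the next frame.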

Figure 14 illustrates the results of this animation 
technique. In particular, Figure 14a shows three frames 
of the source video sequence, Figure 14b shows the target 
image (which in this embodiment is computer-generated) 
and Figure 14c shows the corresponding three frames of 
the target video sequence obtained in the manner 
described above. As can be seen, an animated sequence of 
the computer-generated character has been generated from 
a video clip of a real person and a single image of the 
computer-generated character. 

AUTOMATIC GENERATION OF APPEARANCE PARAMETERS

In step S51, appearance parameters for the first actor's
head in each video frame were automatically calculated.
In this embodiment, this is achieved in a two-step
process. In the first step, an initial set of appearance
parameters for the head is found using a simple and rapid
technique. For all but the first frame of the source
video sequence, this is achieved by simply using the
appearance parameters ($c_s^{i-1}$) from the preceding video
frame (before modification in step S53). As described
above, the appearance parameters (c) effectively define
the shape and grey level of the head, but they do not
define the scale, position and orientation of the head
within the video frame. For all but the first frame in
the source video sequence, these also can be initially
estimated to be the same as those for the head in the
preceding frame.

For the first frame, if it is one of the training images 
input to the appearance model generation unit 31, then 
the scale, position and orientation of the head within 
the frame will be known from the manual placement of the 
landmark points and the appearance parameters can be 
generated from the shape parameters and the shape- 
normalised grey level parameters obtained during 
training. If the first frame is not one of the training 
images, as in the present embodiment, then the initial 
estimate of the appearance parameters is set to the mean 
set of appearance parameters (i.e. all the appearance 
parameters are zero) and the scale, position and
orientation are initially estimated by the user manually
placing the mean face over the head in the first frame. 

In the second step, an iterative technique is used in
order to make fine adjustments to the initial estimate of
the appearance parameters. The adjustments are made in
an attempt to minimise the difference between the head
described by the appearance parameters (the model head)
and the head in the current video frame (the image head).
With 30 appearance parameters, this represents a
difficult optimisation problem. However, since each
attempt to match the model head to a new image head is
actually a similar optimisation problem, it is possible
to learn in advance how the parameters should be changed
for a given difference. For example, if the largest
differences between the model head and the image head
occur at the sides of the head, then this implies that a
parameter that adjusts the width of the model head should
be adjusted.

In this embodiment, it is assumed that there is a linear
relationship between the error ($\delta c$) in the appearance
parameters (i.e. the change to be made) and the
difference ($\delta I$) between the model head and the image
head, i.e.

$$\delta c = A\,\delta I \qquad (11)$$

In this embodiment, the relationship (A) was found by
performing multiple multivariate linear regressions on a
large sample of known model displacements ($\delta c$) and the
corresponding difference images ($\delta I$). These large sets
of random displacements were obtained by perturbing the
true model parameters for the images in the training set
by a known amount. As well as perturbations in the model
parameters, small displacements in the scale, position
and orientation were also modelled and included in the
regression; for simplicity of notation, the parameters
describing scale, position and orientation were regarded
simply as extra elements within the vector $\delta c$. In this
embodiment, during the training, the difference between
the model head and the image head was determined from the
difference between the corresponding shape-normalised
grey level vectors. In particular, for the current
location within the video frame, the actual
shape-normalised grey level vector $g_i$ was determined (in the
manner described above with reference to Figure 7), which
was then compared with the grey level vector $g_m$ obtained
from the current appearance parameters using equation 7
above, i.e.

$$\delta I = \delta g = g_i - g_m \qquad (12)$$
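
The regression described above can be sketched as an ordinary least-squares fit. This is only an illustration under stated assumptions: it uses NumPy, a small synthetic linear grey level model (the matrix `Q`) in place of real training images, and a hypothetical function name `learn_update_matrix`. The source does not specify how the regression was implemented.

```python
import numpy as np

def learn_update_matrix(delta_c, delta_g):
    """Fit A so that delta_c is approximated by A @ delta_g.

    delta_c: (n_samples, n_params) known parameter displacements.
    delta_g: (n_samples, n_points) corresponding grey level differences.
    """
    delta_c = np.asarray(delta_c, dtype=float)
    delta_g = np.asarray(delta_g, dtype=float)
    # Least squares solves delta_g @ A.T ~= delta_c for A.T.
    a_transposed, *_ = np.linalg.lstsq(delta_g, delta_c, rcond=None)
    return a_transposed.T

# Synthetic stand-in for the training data (an assumption, not the real data).
rng = np.random.default_rng(0)
n_params, n_points, n_samples = 3, 8, 200
Q = rng.normal(size=(n_points, n_params))   # hypothetical linear grey level model
delta_c = rng.normal(scale=0.1, size=(n_samples, n_params))  # known perturbations
# Displacing the model by delta_c gives g_i - g_m = -Q @ delta_c (equation 12).
delta_g = -delta_c @ Q.T
A = learn_update_matrix(delta_c, delta_g)
```

In practice the rows of `delta_g` would come from sampling real training images at the perturbed parameters, and the scale, position and orientation displacements would be appended to each row of `delta_c`, as the text describes.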

After A has been determined from this training phase, an
iterative method for solving the optimisation problem can
be determined by calculating the grey level difference
vector, $\delta g$, for the current estimate of the appearance
parameters and then generating a new estimate for the
appearance parameters from:

$$c' = c - A\,\delta g \qquad (13)$$

(noting here that the vector c includes the appearance
parameters and the parameters defining the current
estimate of the scale, position and orientation of the
head within the image).
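
The iterative update of equation 13 can be sketched as follows. The code is a minimal illustration, assuming NumPy; `sample_image` and `synthesise` are hypothetical callables standing in for the sampling of the image head at the current estimate and for equation 7 respectively, and the stopping rule (a maximum iteration count and a step-size tolerance) is an assumption, since the text does not state one.

```python
import numpy as np

def refine_parameters(c0, sample_image, synthesise, A, max_iter=20, tol=1e-6):
    """Iterate c' = c - A @ delta_g (equation 13) until the step is small.

    sample_image(c) -> shape-normalised grey levels g_i sampled from the frame
    synthesise(c)   -> model grey levels g_m for the current parameters
    """
    c = np.asarray(c0, dtype=float)
    for _ in range(max_iter):
        delta_g = sample_image(c) - synthesise(c)   # equation 12
        step = A @ delta_g
        c = c - step                                # equation 13
        if np.linalg.norm(step) < tol:              # assumed stopping rule
            break
    return c
```

For a purely linear model the update recovers the true parameters in a single step; with a real appearance model several iterations would normally be needed.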



ALTERNATIVE EMBODIMENTS

As those skilled in the art will appreciate, a number of
modifications can be made to the above embodiment. A
number of these modifications will now be described.



In the above embodiment, the target image frame
illustrated a computer-generated head. This is not
essential. For example, the target image might be a
hand-drawn head or an image of a real person. Figures
14d and 14e illustrate how an embodiment with a hand-drawn
character might be used in character animation. In
particular, Figure 14d shows a hand-drawn sketch of a
character which, when combined with the frames from the
source video sequence (some of which are shown in Figure
14a), generates a target video sequence, some frames of
which are shown in Figure 14e. As can be seen from a
comparison of the corresponding frames in the source and
target video sequences, the hand-drawn sketch has been
animated automatically using this technique. As those
skilled in the art will appreciate, this is a much
quicker and simpler technique for achieving computer
animation, as compared with existing systems which
require the animator to manually create each frame of the
animation. In particular, in this embodiment, all that
is required is a video sequence of a real life actor
acting out the scene to be animated, together with a
single sketch of the character to be animated.



In the above embodiments, the head, neck and shoulders of
the first actor in the video sequence were modified using
the corresponding head, neck and shoulders from the
target image. This is not essential. As those skilled
in the art will appreciate, only those parts of the image
in and around the landmark points will be modified.
Therefore, if the landmark points are only placed in and
around the first actor's face, then only the face in the
video sequence will be modified. This animation
technique can be applied to any part of the body which is
deformable and even to other animals and objects. For
example, the technique may be applied to just the lips in
the video sequence. Such an embodiment could be used in
film dubbing applications in order to synchronise the lip
movements with the dubbed sound. This animation
technique might also be used to give animals and other
objects human-like characteristics by combining images of
them with a video sequence of an actor.

In the above embodiment, 86 landmark points were placed
around the head, neck and shoulders of the test images.
As those skilled in the art will appreciate, more or fewer
landmark points may be used, depending upon the required
accuracy of the system. Similarly, the number of points in
the shape-normalised head for which a grey level value is
sampled also depends upon the required accuracy of the
system.

In the above embodiment, the shape and grey level of the
heads in the source video sequence and in the target
image were modelled using principal component analysis.
As those skilled in the art will appreciate, by modelling
the features of the heads in this way, it is possible to
accurately model each head by just a small number of
parameters. However, other modelling techniques, such as
vector quantisation and wavelet techniques, can be used.
Furthermore, it is not essential to model each of the
heads; however, doing so results in fewer computations
being required in order to modify each frame in the
source video sequence. In an embodiment where no
modelling is performed, the difference parameters could
simply be the difference between the location of the
landmark points in the target image and in the selected
frame from the source video sequence. They may also
include a set of difference signals indicative of the
difference between the grey level values from the
corresponding heads.

In the above embodiment, the shape parameters and the
grey level parameters were combined to generate the
appearance parameters. This is not essential. A
separate set of shape difference parameters and grey
level difference parameters could be calculated; however,
this is not preferred, since it increases the number of
parameters which have to be automatically generated for
each source video frame in step S51 described above.

In the above embodiments, the source video sequence and
the target image were both black and white. The present
invention can also be applied to colour images. In
particular, if each pixel in the source video frames and
in the target image has a corresponding red, green and
blue pixel value, then instead of sampling the grey level
at each of the 10,656 points in the shape-normalised
head, the colour embodiment would sample each of the red,
green and blue values at those points. The remaining
processing steps would essentially be the same except
that there would be a colour level model which would
model the variations in the colour in the training
images. Further, as those skilled in the art will
appreciate, the way in which colour is represented in
such an embodiment is not important. In particular,
rather than each pixel having a red, green and blue
value, they might be represented by a chrominance and a
luminance component or by hue, saturation and value
components. Both of these embodiments would be simpler
than the red, green and blue embodiment, since the image
search which is required during the automatic calculation
of the appearance parameters in step S51 could be
performed using only the luminance or value component.
In contrast, in the red, green and blue colour
embodiment, each of these terms would have to be
considered in the image search.

In the above embodiment, during the automatic generation
of the appearance parameters, and in particular during
the iterative updating of these appearance parameters
using equation 13 above, the grey level value at each of
the 10,656 points within the grey level vector obtained
for the current location within the video frame and
within the corresponding grey level vector obtained from
the model was considered at each iteration. In an
alternative embodiment, the resolution employed at each
iteration might be changed. For example, in the first
iteration, the grey level value at 1000 points might be
considered to generate the difference vector $\delta g$. Then,
in the second iteration, the grey level value at 3000
points might be considered during the determination of
the difference vector $\delta g$. Then for subsequent iterations
the grey level value at each of the 10,656 points could
be considered during the determination of the difference
vector $\delta g$. By performing the search at different
resolutions, the convergence of the automatically
generated appearance parameters for the current head in
the source video sequence can be achieved more quickly.
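
One way to picture such a coarse-to-fine schedule is the sketch below. It assumes NumPy and picks evenly spaced points at each resolution, which is an illustrative choice only; the text does not say how the subset of points would be selected. The function names are hypothetical. Presumably a separate update matrix A would have to be learned for each subset of points, although the source does not discuss this.

```python
import numpy as np

def subsample_indices(n_points, n_keep):
    """Pick n_keep roughly evenly spaced indices out of n_points."""
    n_keep = min(n_keep, n_points)
    return np.unique(np.linspace(0, n_points - 1, n_keep).round().astype(int))

def resolution_schedule(n_points=10656, coarse_steps=(1000, 3000), n_iterations=5):
    """Yield the point indices for each iteration: coarse first, then all points."""
    for it in range(n_iterations):
        n_keep = coarse_steps[it] if it < len(coarse_steps) else n_points
        yield subsample_indices(n_points, n_keep)

# Number of grey level samples compared at each iteration.
sizes = [len(idx) for idx in resolution_schedule()]
```

Each set of indices would be used to select the same elements from both the image grey level vector and the model grey level vector before forming the difference vector.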

In the above embodiment, a single target image was used 
to modify the source video sequence. As those skilled in 
the art will appreciate, two or more images of the second 
actor could be used during the training of the appearance 
model and during the generation of the difference 
parameters. In such an embodiment, during the 
determination of the difference parameters, each of the 
target images would be paired with a similar image from 
the source video sequence and the difference parameters 
determined from each would be averaged to determine a set 
of average difference parameters. 
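
The averaging described above amounts to taking the mean of the per-pair differences. A minimal sketch, assuming NumPy and that the appearance parameters of each paired target and source image are stacked as rows of two arrays (the function name is hypothetical):

```python
import numpy as np

def average_difference_parameters(target_params, source_params):
    """Average c_T - c_s over paired target and source images (one row per pair)."""
    c_t = np.asarray(target_params, dtype=float)
    c_s = np.asarray(source_params, dtype=float)
    if c_t.shape != c_s.shape:
        raise ValueError("each target image needs exactly one paired source image")
    return (c_t - c_s).mean(axis=0)
```

The result can be used in place of the single difference vector of equation 9.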

In the above embodiment, the difference parameters were 
determined by comparing the image of the first actor from 
one of the frames from the source video sequence with the 
image of the second actor in the target image. In an
alternative embodiment, a separate image of the first
actor may be provided which does not form part of the 
source video sequence. 

In the above embodiments, each of the images in the
source video sequence and the target image were
two-dimensional images. The above technique could be adapted
to work with 3D modelling and animations. In such an
embodiment, the training data would comprise a set of 3D
models instead of 2D images. Instead of the shape model
being a two-dimensional triangular mesh, it would be a
three-dimensional triangular mesh. The 3D models in the
training set would have to be based on the same
standardised mesh, i.e., like the 2D embodiment, they
would each have the same number of landmark points with
each landmark point being in the same corresponding
position in each model. The grey level model would be
sampled from the texture image mapped onto the
three-dimensional triangles formed by the mesh of landmark
points. The three-dimensional models may be obtained
using a three-dimensional scanner, which typically works
either by using laser range-finding over the object or by
using one or more stereo pairs of cameras. The
standardised 3D triangular mesh would then be fitted to
the 3D model obtained from the scanner. Once a 3D
appearance model has been created from the training
models, new 3D models can be generated by adjusting the
appearance parameters, and existing 3D models can be
animated using the same differencing technique that was
used in the two-dimensional embodiment described above.

In the above embodiment, the grey level vector was 
determined from the shape-normalised head of the first 
and second actors. Other types of grey level model might 
be used. For example, a profile of grey level values at 
each landmark point might be used instead of or in 
addition to the sampled grey level value across the 
object. The way in which such profiles might be 
generated and the way in which the appearance parameters 
would be automatically found during step S51 in such an
embodiment can be found in the above paper by Cootes et
al and in the paper entitled "Automatic Interpretation 
and Coding of Face Images using Flexible Models" by 
Andreas Lanitis, IEEE Transactions on Pattern Analysis 
and Machine Intelligence, Vol. 19, No. 7, July 1997, the 
contents of which are incorporated herein by reference. 

During training of the above embodiment, the landmark 
points were manually placed on each of the training 
images by the user. In an alternative embodiment, an
existing model might be used to automatically locate the
landmark points on the training faces. Depending
on the result of this automatic placement of the landmark 
points, the user may have to manually adjust the position 
of some of the landmark points. However, even in this 
case, the automatic placement of the landmark points 
would considerably reduce the time required to train the 
system. 

In the above embodiment, during the automatic
determination of the appearance parameters for the first
frame in the source video sequence, they were initially
set to be equal to the mean appearance parameters, with
the scale, position and orientation set by the user.
In an alternative embodiment, an initial estimate of the 
appearance parameters and of the scale, position and 
orientation of the head within the first frame can be 
determined from the nearest frame which was a training 
image (which, in the first embodiment, was frame f 8 3 ). 
However, this technique might not be accurate enough if 
the scale, position and/or orientation of the head has 
moved considerably between the first frame in the 
sequence and the first frame which was a training image. 
In this case, an initial estimate for the appearance
parameters for the first frame can be the appearance
parameters corresponding to the training head which is
the most similar to the head in the first frame
(determined from a visual inspection), and an initial
estimate of the scale, position and orientation of the
head within the first frame can be determined by matching
the head which can be regenerated from those appearance
parameters against the first frame, for various scales,
positions and orientations, and choosing the scale,
position and orientation which provides the best match.

In the above embodiments, a set of difference parameters
was identified which describes the main differences
between the actor in the video sequence and the actor in
the target image, which difference parameters were used
to modify the video sequence so as to generate a target
video sequence showing the second actor. In the
embodiment, the set of difference parameters was added
to a set of appearance parameters for the current frame
being processed. In an alternative embodiment, the
difference parameters may be weighted so that, for
example, the target video sequence shows an actor having
characteristics from both the first and second actors.
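
The weighting can be illustrated with a one-line variant of equation 10. The sketch assumes NumPy and a single scalar weight, which is only one possible scheme; the source does not specify how the weighting would be applied, and a separate weight per parameter would work the same way. The function name is hypothetical.

```python
import numpy as np

def blend_parameters(c_source, c_dif, weight=0.5):
    """Return c_source + weight * c_dif (the scalar weight is an assumption)."""
    return np.asarray(c_source, dtype=float) + weight * np.asarray(c_dif, dtype=float)
```

With this form, a weight of 0 leaves the first actor unchanged, a weight of 1 reproduces equation 10, and intermediate values give a head with characteristics of both actors.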

In the above embodiment, a target image was used to
modify each frame within a video sequence of frames. In
an alternative embodiment, the target image might be used
to modify a single source image. In this case, the
difference parameters might be weighted in the manner
described above so that the resulting object in the image
is a cross between the object in the source image and the
object in the target image. Alternatively, two source
images might be provided, with the difference parameters
being calculated with respect to one of the source images
and then applied to the second source image in
order to generate the desired target image.

CLAIMS:

1. An image processing apparatus comprising:

means for receiving a source image of a first
object;

means for receiving a target image of a second
object;

means for comparing an image of the first object
with the image of the second object to generate a
difference signal; and

means for modifying the source image of the first
object using said difference signal to generate a target
image having characteristics of the first and second
objects.

2. An image processing apparatus comprising:

means for receiving a source animated sequence of
frames showing a first object;

means for receiving a target image showing a second
object;

means for comparing an image of the first object
with the image of the second object to generate a
difference signal; and

means for modifying the image of the first object in
each frame of said sequence of frames using said
difference signal to generate a target animated sequence
of frames showing the second object.

3. An apparatus according to claim 2, wherein said
first object moves within said animated sequence of
frames, and wherein said modifying means is arranged so
that the target animated sequence of frames shows the
second object moving in a similar manner.

4. An apparatus according to claim 2 or 3, wherein said
first object deforms over the sequence of frames and
wherein said modifying means is arranged so that the
target animated sequence of frames shows the second
object deforming in a similar manner.

5. An apparatus according to claim 4, wherein said
modifying means is operable for adding said difference
signal to the image of said first object in each frame of
said source animated sequence of frames to generate said
target animated sequence of frames.

6. An apparatus according to any preceding claim,
wherein said comparing means is operable to compare a
first set of signals characteristic of the image of the
first object with a second set of signals characteristic
of the image of the second object to generate a set of
difference signals.

7. An apparatus according to claim 6, wherein said
modifying means is operable to use said set of difference
signals to generate said target animated sequence of
frames.

8. An apparatus according to claim 6 or 7, comprising
processing means for processing the image of the second
object and the image of the first object in order to
generate said first and second sets of signals.

9. An apparatus according to claim 8, further
comprising model means for modelling the visual
characteristics of the first and second objects, and
wherein said processing means is arranged to generate
said first and second sets of signals using said model
means.

10. An apparatus according to claim 9, wherein said
model means is operable for modelling the variation of
the appearance of the first and second objects within the
received frames of the source animated sequence of frames
and the received target image.

11. An apparatus according to claim 9 or 10, wherein
said modifying means is operable (i) for determining, for
the current frame being modified, a set of signals
characteristic of the appearance of the first object in
the frame using said model; (ii) to combine said set of
signals with said difference signal to generate a set of
modified signals; and (iii) to regenerate a corresponding
frame using the modified set of signals and the model.

12. An apparatus according to any of claims 9 to 11,
wherein said model means is operable for modelling the
shape and colour of said first and second objects in said
images.

13. An apparatus according to claim 12, wherein said 
model means is operable for modelling the shape and grey 
level of said first and second objects in said images. 

14. An apparatus according to claim 12 or 13, comprising
normalisation means for normalising the shape of said
first and second objects in said images and wherein said
model means is operable for modelling the colour within
the shape-normalised first and second objects.

15. An apparatus according to any of claims 9 to 14,
further comprising training means, responsive to the
identification of the location of a plurality of points
over the first and second objects in a set of training
images, for training said model means to model the
variation of the position of said points within said set
of training images.

16. An apparatus according to any of claims 9 to 15, 
wherein said training images include frames from the 
source animated sequence of frames and the target image. 

17. An apparatus according to claim 14, 15 or 16, 
wherein said training means is operable to perform a 
principal component analysis modelling technique on the 
set of training images for training said model means.

18. An apparatus according to claim 17, wherein said 
training means is operable to perform a principal 
component analysis on a set of training data indicative 
of the shape of the objects within the training images 
for training said model means. 

19. An apparatus according to claim 17 or 18, wherein 
said training means is operable to perform a principal 
component analysis on a set of data describing the colour 
over the objects within the training images for training 
said model means. 

20. An apparatus according to claim 19 when dependent 
upon claim 18, wherein said training means is operable to 
perform a principal component analysis on a set of data 
obtained using a model obtained from the principal 
component analysis of the shape and the colour of the 
objects in the training images in order to train said 
model means to model both shape and colour variation 
within the objects of the training images. 

21. An apparatus according to any of claims 6 to 20,
wherein said comparing means is operable to subtract the
first set of signals characteristic of the image of the
first object from the second set of signals
characteristic of the image of the second object in order
to generate said set of difference signals.

22. An apparatus according to any preceding claim,
wherein said modifying means comprises means for
processing each frame of the source animated sequence of
frames in order to generate a set of signals
characteristic of the first object in the frame being
processed and wherein said modifying means is operable to
modify the set of signals for the current frame being
processed by combining them with said difference signal.

23. An apparatus according to any preceding claim,
wherein said modifying means is arranged to modify each
frame within the source animated sequence of frames in
turn, in accordance with the position of the frame within
the sequence of frames.

24. An apparatus according to any preceding claim,
wherein said modifying means is arranged to automatically
generate said target animated sequence from said source
animated sequence and said difference signal.

25. An apparatus according to any preceding claim,
wherein said image of the first object is obtained from
a frame of said source animated sequence.

26. An apparatus according to any preceding claim,
wherein said comparing means is arranged to compare a
plurality of images of said first object with a plurality
of images of said second object in order to generate a
corresponding plurality of difference signals which are
combined to generate said difference signal.

27. An apparatus according to claim 26, wherein said
difference signal represents the average of said
plurality of difference signals.

28. An apparatus according to any preceding claim,
wherein the image of said first object is selected so as
to generate a minimum difference signal.

29. An apparatus according to any preceding claim,
wherein at least one of said first and second objects
comprises a face.

30. An apparatus according to any preceding claim,
wherein said target image comprises an image of a
hand-drawn or a computer generated face.



31. A graphics processing apparatus comprising:

means for receiving a source animated sequence of
graphics data of a first object;

means for receiving a target set of graphics data of
a second object;

means for comparing graphics data of the first
object with graphics data of the second object to
generate a difference signal; and

means for modifying the graphics data in the
animated sequence of graphics data using said difference
signal to generate a target animated sequence of graphics
data of the second object.

32. An apparatus according to claim 31, wherein said
graphics data represents a 3D model or a 2D image.

33. A graphics processing apparatus comprising: 

means for receiving a source animated sequence of 3D
models of a first object;



WO 00/17820    PCT/GB99/03161

means for receiving a target 3D model of a second
object;

means for comparing a 3D model of the first object
with the 3D model of the second object to generate a
difference signal; and

means for modifying each 3D model in the sequence of
3D models for the first object using said difference
signal to generate a target animated sequence of 3D
models for the second object.

34. An image processing apparatus comprising:

means for receiving a source sequence of frames
recording a first animated object;

means for receiving a target image recording a
second object;

means for comparing an image of the first object
with the image of the second object to generate a set of
difference signals; and

means for modifying the image of the first object in
each frame of said sequence of frames using said set of
difference signals to generate a target sequence of
frames recording the second object animated in a similar
manner to the animation of the first object.

35. An image processing apparatus comprising:

means for receiving a source sequence of frames
showing a first object which deforms over the sequence of
frames;

means for receiving a target image showing a second
object;

means for comparing an image of the first object
with the image of the second object to generate a
difference signal; and

means for modifying the image of the first object in
each frame of said sequence of frames using said
difference signal to generate a target sequence of frames
showing the second object deforming in accordance with
the deformations of the first object.

36. An image processing apparatus comprising:

means for receiving a source sequence of images
comprising a first object which deforms over the sequence
of images;

means for receiving a target image comprising a
second object;

means for comparing the second object in the target
image with the first object in a selected one of said
images from said sequence of images and for outputting a
comparison result;

means for modifying the first object in each image
of said source sequence of images using said comparison
result to generate a target sequence of images comprising
said second object which deforms in a similar manner to
the way in which said first object deforms in said source
sequence of images.

37. An apparatus for performing computer animation,
comprising:

means for receiving signals representative of a film
of a person acting out a scene;

means for receiving signals representative of a
character to be animated;

means for comparing signals indicative of the
appearance of the person with signals indicative of the
appearance of the character to generate a difference
signal; and

means for modifying the signals representative of
the film using said difference signal to generate
modified signals representative of an animated film of
the character acting out said scene.

38. An image processing method comprising the steps of:

receiving a source animated sequence of frames
showing a first object;

receiving a target image showing a second object;

comparing an image of the first object with the
image of the second object to generate a difference
signal; and

modifying the image of the first object in each
frame of said sequence of frames using said difference
signal to generate a target animated sequence of frames
showing the second object.

39. A method according to claim 38, wherein said first
object moves within said animated sequence of frames, and
wherein said modifying step is such that the target
animated sequence of frames shows the second object
moving in a similar manner.

40. A method according to claim 38 or 39, wherein said
first object deforms over the sequence of frames and
wherein said modifying step is such that the target
animated sequence of frames shows the second object
deforming in a similar manner.

41. A method according to any of claims 38 to 40,
wherein said modifying step combines said difference
signal with the image of said first object in each frame
of said source animated sequence of frames to generate
said target animated sequence of frames.


42. A method according to claim 41, wherein said
modifying step adds said difference signal to the image
of said first object in each frame of said source
animated sequence of frames to generate said target
animated sequence of frames.
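
By way of illustration only, the addition recited in claims 41 and 42 might be sketched as follows. The sketch assumes that each image has already been reduced to a fixed-length list of numbers and that the difference signal is formed as target minus source; the function names `difference_signal` and `add_difference` and the sample data are hypothetical and do not appear in the specification.

```python
from typing import List, Sequence

Vector = List[float]


def difference_signal(source_ref: Sequence[float], target: Sequence[float]) -> Vector:
    """Return target minus source, element by element (assumed sign convention)."""
    if len(source_ref) != len(target):
        raise ValueError("source and target vectors must have the same length")
    return [t - s for s, t in zip(source_ref, target)]


def add_difference(frames: Sequence[Sequence[float]], diff: Sequence[float]) -> List[Vector]:
    """Add the same difference signal to every frame of the source sequence."""
    return [[value + d for value, d in zip(frame, diff)] for frame in frames]


# Hypothetical data: three source frames and one target image, two values each.
source_frames = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]
target_image = [10.0, 20.0]

# The first source frame is used here as the image compared with the target.
diff = difference_signal(source_frames[0], target_image)
target_frames = add_difference(source_frames, diff)
```

Under these assumptions the first generated frame reproduces the target image, and each later frame carries over the frame-to-frame change of the source sequence.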

43. A method according to any of claims 38 to 42,
wherein said comparing step compares a first set of
signals characteristic of the image of the first object
with a second set of signals characteristic of the image
of the second object to generate a set of difference
signals.

44. A method according to claim 43, wherein said
modifying step uses said set of difference signals to
generate said target animated sequence of frames.

45. A method according to claim 43 or 44, comprising the
step of processing the image of the second object and the
image of the first object in order to generate said first
and second sets of signals.

46. A method according to claim 45, further comprising
the step of modelling the visual characteristics of the
first and second objects, and wherein said processing
step generates said first and second sets of signals
using the model generated by said modelling step.

47. A method according to claim 46, wherein said
modelling step generates a model which models the
variation of the appearance of the first and second
objects within the received frames of the source animated
sequence of frames and the received target image.

48. A method according to claim 46 or 47, wherein said
modifying step (i) determines, for the current frame
being modified, a set of signals characteristic of the
appearance of the first object in the frame using said
model; (ii) combines said set of signals with said
difference signal to generate a set of modified signals;
and (iii) regenerates a corresponding frame using the
modified set of signals and the model.
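
By way of illustration only, steps (i) to (iii) of claim 48 might be sketched as follows. The sketch assumes a linear model consisting of a mean vector and a set of orthonormal modes, so that a frame is encoded by projection onto the modes and regenerated by the reverse mapping; the claim itself does not restrict the model to this form. The names `encode`, `decode` and `modify_frame` and the sample values are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def encode(frame: Sequence[float], mean: Sequence[float],
           modes: Sequence[Sequence[float]]) -> Vector:
    """Step (i): project the mean-centred frame onto each (orthonormal) mode."""
    centred = [x - m for x, m in zip(frame, mean)]
    return [sum(q * c for q, c in zip(mode, centred)) for mode in modes]


def decode(params: Sequence[float], mean: Sequence[float],
           modes: Sequence[Sequence[float]]) -> Vector:
    """Step (iii): regenerate a frame as the mean plus a weighted sum of modes."""
    out = list(mean)
    for p, mode in zip(params, modes):
        for i, q in enumerate(mode):
            out[i] += p * q
    return out


def modify_frame(frame: Sequence[float], diff_params: Sequence[float],
                 mean: Sequence[float], modes: Sequence[Sequence[float]]) -> Vector:
    """Encode the frame, add the difference signal (step (ii)), then regenerate."""
    params = encode(frame, mean, modes)
    modified = [p + d for p, d in zip(params, diff_params)]
    return decode(modified, mean, modes)


# Hypothetical two-mode model over three-element frames.
mean = [0.0, 0.0, 0.0]
modes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
new_frame = modify_frame([1.0, 2.0, 0.0], [0.5, -1.0], mean, modes)
```

Working in the parameter space of the model, rather than on raw pixel values, is what allows a single difference signal to be reused for every frame in this sketch.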

49. A method according to any of claims 46 to 48,
wherein said modelling step generates a model which
models the shape and colour of said first and second
objects in said images.

50. A method according to claim 49, wherein said
modelling step generates a model which models the shape
and grey level of said first and second objects in said
images.

51. A method according to claim 49 or 50, comprising the
step of normalising the shape of said first and second
objects in said images and wherein said modelling step
generates a model which models the colour within the
shape-normalised first and second objects.

52. A method according to any of claims 46 to 51,
further comprising the steps of (i) identifying the
location of a plurality of points over the first and
second objects in a set of training images; and (ii)
training said model to model the variation of the
position of said points within said set of training
images.

53. A method according to any of claims 46 to 52, 
wherein said training images include frames from the 
source animated sequence of frames and the target image. 

54. A method according to claim 51, 52 or 53, wherein 
said training step performs a principal component 
analysis modelling technique on the set of training 
images to train said model. 



55. A method according to claim 54, wherein said
training step performs a principal component analysis on
a set of training data indicative of the shape of the
objects within the training images to train said model.
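
By way of illustration only, a principal component analysis of shape data of the kind recited in claim 55 might be sketched as follows. The sketch assumes that each training shape is a flat list of landmark coordinates, and it recovers only the first principal direction, using power iteration on the sample covariance matrix as a simple stand-in for a full eigen-decomposition. The function name `first_shape_mode` and the sample data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def first_shape_mode(samples: Sequence[Sequence[float]],
                     iterations: int = 100) -> Tuple[Vector, Vector]:
    """Return (mean shape, first principal direction) of the training shapes."""
    n = len(samples)
    if n < 2:
        raise ValueError("at least two training shapes are needed")
    mean = [sum(col) / n for col in zip(*samples)]
    centred = [[x - m for x, m in zip(s, mean)] for s in samples]
    dim = len(mean)
    # Sample covariance matrix of the mean-centred shape vectors.
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(dim)]
           for i in range(dim)]
    # Power iteration. The start vector is arbitrary; it must not be
    # orthogonal to the dominant direction for the iteration to find it.
    v = [1.0 / (i + 1) for i in range(dim)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variation along the current direction
        v = [x / norm for x in w]
    return mean, v


# Hypothetical training shapes lying along a single direction.
training_shapes = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
mean_shape, mode = first_shape_mode(training_shapes)
```

A practical system would normally retain several modes and would use a library eigen-solver; the single-mode version is shown only to keep the example short.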


56. A method according to claim 54 or 55, wherein said 
training step performs a principal component analysis on 
a set of data describing the colour over the objects 
within the training images to train said model. 


57. A method according to claim 56 when dependent upon
claim 55, wherein said training step performs a principal
component analysis on a set of data obtained using the
models obtained from the principal component analysis of
the shape and the colour of the objects in the training
images in order to train said model to model both shape
and colour variation within the objects of the training
images.

58. A method according to any of claims 43 to 57, 
wherein said comparing step subtracts the first set of 
signals characteristic of the image of the first object 
from the second set of signals characteristic of the 
image of the second object in order to generate said set 
of difference signals. 

59. A method according to any of claims 38 to 58, 
wherein said modifying step comprises the step of 
processing each frame of the source animated sequence of
frames in order to generate a set of signals
characteristic of the first object in the frame being
processed and wherein said modifying step modifies the 
set of signals for the current frame being processed by 
combining them with said difference signal. 

60. A method according to any of claims 38 to 59,
wherein said modifying step is arranged to modify each
frame within the source animated sequence of frames in
turn, in accordance with the position of the frame within
the sequence of frames.

61. A method according to any of claims 38 to 60,
wherein said modifying step automatically generates said
target animated sequence from said source animated
sequence and said difference signal.

62. A method according to any of claims 38 to 61, 
wherein said image of the first object is obtained from 
a frame of said source animated sequence. 


63. A method according to any of claims 38 to 62,
wherein said comparing step compares a plurality of
images of said first object with a plurality of images of
said second object in order to generate a corresponding
plurality of difference signals which are combined to
generate said difference signal.

64. A method according to claim 63, wherein said
difference signal represents the average of said
plurality of difference signals.
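
By way of illustration only, the averaging recited in claims 63 and 64 might be sketched as follows. The sketch assumes that the images of the first and second objects have been paired one-to-one and reduced to equal-length lists of numbers, and that each difference is formed as target minus source; the function name `average_difference` and the sample data are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def average_difference(source_sets: Sequence[Sequence[float]],
                       target_sets: Sequence[Sequence[float]]) -> Vector:
    """Average the per-pair difference signals into one difference signal."""
    if not source_sets or len(source_sets) != len(target_sets):
        raise ValueError("need the same non-zero number of source and target sets")
    # One difference signal per (source, target) pair.
    diffs = [[t - s for s, t in zip(src, tgt)]
             for src, tgt in zip(source_sets, target_sets)]
    count = len(diffs)
    # Element-wise mean across the plurality of difference signals.
    return [sum(column) / count for column in zip(*diffs)]


# Hypothetical paired data: two images of each object.
sources = [[0.0, 0.0], [2.0, 2.0]]
targets = [[1.0, 3.0], [5.0, 3.0]]
mean_diff = average_difference(sources, targets)
```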

65. A method according to any of claims 38 to 64, 
wherein the image of said first object is selected so as 
to generate a minimum difference signal. 
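
By way of illustration only, the selection recited in claim 65 might be sketched as follows. The claim does not state how the size of a difference signal is measured; the sketch assumes the squared Euclidean length, and assumes each candidate image of the first object has been reduced to a list of numbers. The function name `select_reference` and the sample data are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def squared_length(source: Sequence[float], target: Sequence[float]) -> float:
    """Squared Euclidean length of (target - source); an assumed measure."""
    return sum((t - s) ** 2 for s, t in zip(source, target))


def select_reference(source_sets: Sequence[Sequence[float]],
                     target: Sequence[float]) -> Tuple[int, Vector]:
    """Pick the source image whose difference from the target is smallest."""
    if not source_sets:
        raise ValueError("at least one source image is needed")
    best = min(range(len(source_sets)),
               key=lambda i: squared_length(source_sets[i], target))
    diff = [t - s for s, t in zip(source_sets[best], target)]
    return best, diff


# Hypothetical candidates: three images of the first object, one target.
candidates = [[0.0, 0.0], [3.0, 4.0], [9.0, 9.0]]
index, min_diff = select_reference(candidates, [3.0, 5.0])
```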


66. A method according to any of claims 38 to 65, 
wherein at least one of said first and second objects 
comprises a face. 






67. A method according to any of claims 38 to 66,
wherein said target image comprises an image of a hand-drawn
or a computer generated face.

68. A graphics processing method comprising the steps 
of: 

inputting a source animated sequence of graphics 
data for a first object; 

comparing graphics data for the first object with 
graphics data for a second object to generate a 
difference signal; and 

modifying the graphics data in the animated sequence 
of graphics data using said difference signal to generate 
a target animated sequence of graphics data for the 
second object. 

69. A method according to claim 68, wherein said 
graphics data represents a 3D model or a 2D image. 

70. A graphics processing method comprising the steps 
of: 

receiving a source animated sequence of 3D models of 
a first object; 

receiving a target 3D model of a second object; 

comparing a 3D model of the first object with the 3D
model of the second object to generate a difference
signal; and

modifying each 3D model in the sequence of 3D models
for the first object using said difference signal to
generate a target animated sequence of 3D models for the
second object.

71. An image processing method comprising the steps of:

receiving a source sequence of frames showing a
first animated object;

receiving a target image showing a second object;

comparing an image of the first object with the
image of the second object to generate a set of
difference signals; and

modifying the image of the first object in each
frame of said sequence of frames using said set of
difference signals to generate a target sequence of
frames showing the second object animated in a similar
manner to the animation of the first object.

72. An image processing method comprising the steps of:

receiving a source sequence of frames showing a
first object which deforms over the sequence of frames;

receiving a target image showing a second object;

comparing an image of the first object with the
image of the second object to generate a difference
signal; and

modifying the image of the first object in each
frame of said sequence of frames using said difference
signal to generate a target sequence of frames showing
the second object deforming in accordance with the
deformations of the first object.

73. An image processing method comprising the steps of:

receiving a source sequence of images comprising a
first object which deforms over the sequence of images;

receiving a target image comprising a second object;

comparing the second object in the target image with
the first object in a selected one of said images from
said sequence of images and outputting a comparison
result;

modifying the first object in each image of said
source sequence of images using said comparison result to
generate a target sequence of images comprising said
second object which deforms in a similar manner to the
way in which said first object deforms in said source
sequence of images.

74. A computer animation method, comprising the steps
of:

receiving signals representative of a film of a
person acting out a scene;

receiving signals representative of a character to
be animated;

comparing signals indicative of the appearance of
the person with signals indicative of the appearance of
the character to generate a difference signal; and

modifying the signals representative of the film
using said difference signal to generate modified signals
representative of an animated film of the character
acting out said scene.

75. An apparatus according to any of claims 1 to 37,
wherein said modifying means is operable to apply a
weighting to said difference signal and to generate said
target image using said weighted difference signal.
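
By way of illustration only, the weighting recited in claim 75 might be sketched as follows. The sketch assumes a single scalar weight applied uniformly to every element of the difference signal before it is added to the source data; the claim does not exclude other forms of weighting. The function name `apply_weighted_difference` and the sample values are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def apply_weighted_difference(frame: Sequence[float], diff: Sequence[float],
                              weight: float) -> Vector:
    """Scale the difference signal by `weight`, then add it to the frame."""
    if len(frame) != len(diff):
        raise ValueError("frame and difference signal must have the same length")
    return [value + weight * d for value, d in zip(frame, diff)]


# Hypothetical values: a weight of 0 leaves the source unchanged, a weight of 1
# applies the full difference, and intermediate weights give a partial change.
source = [1.0, 2.0]
diff = [2.0, 4.0]
unchanged = apply_weighted_difference(source, diff, 0.0)
halfway = apply_weighted_difference(source, diff, 0.5)
full = apply_weighted_difference(source, diff, 1.0)
```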

76. A storage medium storing processor implementable 
instructions for controlling a processor to carry out the 
method of any one of claims 38 to 74. 


77. An electromagnetic or acoustic signal carrying 
processor implementable instructions for controlling a 
processor to carry out the method of any one of claims 38 
to 74. 


78. Processor implementable instructions for controlling 
a processor to carry out the method of any one of claims 
38 to 74. 